Using the Mutual k-Nearest Neighbor Graphs for Semi-supervised Classification on Natural Language Data
نویسندگان
چکیده
The first step in graph-based semi-supervised classification is to construct a graph from input data. While the k-nearest neighbor graphs have been the de facto standard method of graph construction, this paper advocates using the less well-known mutual k-nearest neighbor graphs for high-dimensional natural language data. To compare the performance of these two graph construction methods, we run semi-supervised classification methods on both graphs in word sense disambiguation and document classification tasks. The experimental results show that the mutual k-nearest neighbor graphs, if combined with maximum spanning trees, consistently outperform the knearest neighbor graphs. We attribute better performance of the mutual k-nearest neighbor graph to its being more resistive to making hub vertices. The mutual k-nearest neighbor graphs also perform equally well or even better in comparison to the state-of-the-art b-matching graph construction, despite their lower computational complexity.
منابع مشابه
A Comparison of Graph Construction and Learning Algorithms for Graph-Based Phonetic Classification
Graph-based semi-supervised learning (SSL) algorithms have been widely applied in large-scale machine learning. In this work, we show different graph-based SSL methods (modified adsorption, measure propagation, and prior-based measure propagation) and compare them to the standard label propagation algorithm on a phonetic classification task. In addition, we compare 4 different ways of construct...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملImproved Nearest Neighbor Methods For Text Classification With Language Modeling and Harmonic Functions
We present new nearest neighbor methods for text classification and an evaluation of these methods against the existing nearest neighbor methods as well as other well-known text classification algorithms. Inspired by the language modeling approach to information retrieval, we show improvements in k-nearest neighbor (kNN) classification by replacing the classical cosine similarity with a KL dive...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملImproved Nearest Neighbor Methods For Text Classification
We present new nearest neighbor methods for text classification and an evaluation of these methods against the existing nearest neighbor methods as well as other well-known text classification algorithms. Inspired by the language modeling approach to information retrieval, we show improvements in k-nearest neighbor (kNN) classification by replacing the classical cosine similarity with a KL dive...
متن کامل